Statistical methods for tissue array images - algorithmic scoring and co-training
Recent advances in tissue microarray technology have allowed
immunohistochemistry to become a powerful medium-to-high throughput analysis
tool, particularly for the validation of diagnostic and prognostic biomarkers.
However, as study size grows, the manual evaluation of these assays becomes a
prohibitive limitation; it vastly reduces throughput and greatly increases
variability and expense. We propose an algorithm, Tissue Array Co-Occurrence
Matrix Analysis (TACOMA), for quantifying cellular phenotypes based on
textural regularity summarized by local inter-pixel relationships. The
algorithm can be easily trained for any staining pattern, has no sensitive
tuning parameters, and can report the salient pixels in an image that
contribute to its score. Pathologists' input via informative training patches
is an important aspect of the algorithm: it allows TACOMA to be trained for any
specific marker or cell type. With co-training, the error rate of TACOMA can be
reduced substantially even for a very small training sample (e.g., a sample of
size 30). We give theoretical insights into the success of co-training via
thinning of the feature set in a high-dimensional setting when there is
"sufficient" redundancy among the features. TACOMA is flexible and transparent,
and it provides a scoring process that can be evaluated with clarity and
confidence. In a study based on an estrogen receptor (ER) marker, we show that
TACOMA is comparable to or better than pathologists in terms of accuracy and
repeatability.
Comment: Published at http://dx.doi.org/10.1214/12-AOAS543 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org)
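The abstract names TACOMA's core feature, a co-occurrence matrix over local inter-pixel relationships, but gives no details. As a rough illustration only, the sketch below computes a standard gray-level co-occurrence matrix (GLCM) for a few pixel offsets and flattens the matrices into one feature vector. The number of gray levels, the offsets, the symmetric and normalized counting, and the names `quantize`, `cooccurrence_matrix` and `glcm_features` are assumptions made for this example, not TACOMA's published settings.

```python
# Sketch of gray-level co-occurrence matrix (GLCM) features, assuming an
# image given as a list of rows of integer intensities in [0, max_value].
# Levels, offsets and function names are illustrative assumptions.

def quantize(image, levels, max_value=255):
    """Map raw intensities to `levels` equal-width gray levels 0..levels-1."""
    return [[min(p * levels // (max_value + 1), levels - 1) for p in row]
            for row in image]


def cooccurrence_matrix(image, levels, offset=(0, 1), symmetric=True,
                        normalize=True):
    """Count pairs (i, j): a pixel of level i with level j at `offset`."""
    dr, dc = offset
    rows, cols = len(image), len(image[0])
    glcm = [[0.0] * levels for _ in range(levels)]
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = image[r][c], image[r2][c2]
                glcm[i][j] += 1
                if symmetric:  # also count the reverse direction
                    glcm[j][i] += 1
    if normalize:
        total = sum(sum(row) for row in glcm)
        if total > 0:
            glcm = [[v / total for v in row] for row in glcm]
    return glcm


def glcm_features(image, levels=8,
                  offsets=((0, 1), (1, 0), (1, 1), (1, -1))):
    """Concatenate flattened GLCMs for several offsets into one vector."""
    q = quantize(image, levels)
    features = []
    for offset in offsets:
        m = cooccurrence_matrix(q, levels, offset)
        features.extend(v for row in m for v in row)
    return features


if __name__ == "__main__":
    patch = [[0, 0, 255, 255],
             [0, 0, 255, 255],
             [255, 255, 0, 0],
             [255, 255, 0, 0]]
    vec = glcm_features(patch, levels=2)
    print(len(vec))  # 4 offsets x 2 x 2 entries -> 16
```

Vectors like these, computed for pathologist-labelled training patches, could then be passed to a classifier; which classifier TACOMA uses, and how it traces a score back to salient pixels, is not stated here.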
K-nearest Neighbor Search by Random Projection Forests
K-nearest neighbor (kNN) search has wide applications in many areas,
including data mining, machine learning, statistics and many applied domains.
Inspired by the success of ensemble methods and the flexibility of tree-based
methodology, we propose random projection forests (rpForests) for kNN search.
rpForests finds kNNs by aggregating results from an ensemble of random
projection trees, each constructed recursively through a series of
carefully chosen random projections. rpForests achieves remarkable accuracy:
both the rate of missed kNNs and the discrepancy in the kNN distances decay
quickly. rpForests also has very low computational complexity. Because it is
an ensemble, rpForests is easy to run in parallel on multicore or clustered
computers; the running time is expected to be nearly inversely proportional
to the number of cores or machines. We give theoretical insights by showing
that the probability that neighboring points are separated by the ensemble of
random projection trees decays exponentially as the ensemble size increases.
Our theory can be used to refine the choice of random projections in the
growth of trees, and experiments show that the effect is remarkable.
Comment: 15 pages, 4 figures, 2018 IEEE Big Data Conference
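To make the ensemble idea concrete, here is a minimal sketch in plain Python. It assumes each tree splits a node at the median of the points' projections onto a single random Gaussian direction and stops at a hypothetical `leaf_size`; the "carefully chosen" projections of rpForests are not reproduced. A query descends every tree to one leaf, the leaves' points are pooled as candidates, and exact distances rank them. The names `build_rp_tree`, `query_leaf`, `RPForest` and `brute_knn` are illustrative, not taken from the paper.

```python
import math
import random


def _project(point, direction):
    """Dot product of a point with a projection direction."""
    return sum(p * d for p, d in zip(point, direction))


def build_rp_tree(points, indices, leaf_size, rng):
    """Recursively split `indices` at the median of one random projection.

    Assumes leaf_size >= 1. A plain Gaussian direction is drawn at every
    node (an assumption, not the paper's projection-selection rule).
    """
    if len(indices) <= leaf_size:
        return {"leaf": indices}
    direction = [rng.gauss(0.0, 1.0) for _ in range(len(points[0]))]
    projected = sorted((_project(points[i], direction), i) for i in indices)
    mid = len(projected) // 2
    return {
        "dir": direction,
        "thr": projected[mid][0],  # median projection value
        "left": build_rp_tree(points, [i for _, i in projected[:mid]], leaf_size, rng),
        "right": build_rp_tree(points, [i for _, i in projected[mid:]], leaf_size, rng),
    }


def query_leaf(tree, q):
    """Descend one tree to the leaf whose region contains query q."""
    node = tree
    while "leaf" not in node:
        side = "left" if _project(q, node["dir"]) < node["thr"] else "right"
        node = node[side]
    return node["leaf"]


class RPForest:
    """Ensemble of random projection trees for approximate kNN search."""

    def __init__(self, points, n_trees=10, leaf_size=20, seed=0):
        rng = random.Random(seed)
        self.points = points
        all_indices = list(range(len(points)))
        self.trees = [build_rp_tree(points, all_indices, leaf_size, rng)
                      for _ in range(n_trees)]

    def knn(self, q, k):
        """Pool leaf members from every tree, then rank by exact distance."""
        candidates = set()
        for tree in self.trees:
            candidates.update(query_leaf(tree, q))
        return sorted(candidates, key=lambda i: math.dist(q, self.points[i]))[:k]


def brute_knn(points, q, k):
    """Exact kNN by linear scan, for checking the forest's answers."""
    return sorted(range(len(points)), key=lambda i: math.dist(q, points[i]))[:k]


if __name__ == "__main__":
    rng = random.Random(1)
    data = [[rng.random() for _ in range(8)] for _ in range(2000)]
    query = [0.5] * 8
    forest = RPForest(data, n_trees=10, leaf_size=25, seed=2)
    approx = set(forest.knn(query, 10))
    exact = set(brute_knn(data, query, 10))
    print("recall@10:", len(approx & exact) / 10)
```

Each tree is built and queried independently, so the loop over trees is the natural unit to spread across cores or machines; this sketch simply runs it serially. Adding trees enlarges the candidate pool, which lowers the chance that a true neighbor is cut off from the query in every tree.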